[SPARK-56060][PS] Handle pandas 3 null string conversion in describe() for empty timestamp frames by ueshin · Pull Request #54893 · apache/spark

ueshin · 2026-03-18T21:55:20Z

What changes were proposed in this pull request?

This PR updates pandas-on-Spark DataFrame.describe() and the related test_describe_empty expectations for empty timestamp-containing frames to handle the pandas 3 astype(str) behavior change on null values.

In pandas 2, empty timestamp stats were string-converted as "None" in the relevant describe() path. In pandas 3, astype(str) preserves those empty stats as missing values instead. This patch updates the pandas-on-Spark result construction and the corresponding test expectations to follow that behavior consistently.
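The two null conventions described above can be modeled in a few lines of plain Python. This is only an illustrative sketch, not the pandas or pandas-on-Spark implementation: `stringify_stats` and its `preserve_missing` flag are hypothetical names, and the shape of the stats list (a zero count followed by null stats) is an assumption made for the example.

```python
from typing import List, Optional


def stringify_stats(
    stats: List[Optional[object]], preserve_missing: bool
) -> List[Optional[str]]:
    """Convert summary statistics to strings under one of two null conventions.

    preserve_missing=False models the pandas 2 outcome (null becomes "None").
    preserve_missing=True models the pandas 3 outcome (null stays missing).
    Hypothetical helper for illustration; not part of pandas or pandas-on-Spark.
    """
    result: List[Optional[str]] = []
    for value in stats:
        if value is None:
            # The only point where the two conventions differ.
            result.append(None if preserve_missing else "None")
        else:
            result.append(str(value))
    return result


# Assumed stats for an empty timestamp column: a zero count, then null stats.
empty_timestamp_stats = [0, None, None, None]

pandas2_style = stringify_stats(empty_timestamp_stats, preserve_missing=False)
pandas3_style = stringify_stats(empty_timestamp_stats, preserve_missing=True)

print(pandas2_style)  # ['0', 'None', 'None', 'None']
print(pandas3_style)  # ['0', None, None, None]
```

An expectation written against the first convention compares against the literal string "None", so it stops matching once the conversion keeps nulls as missing values, which is the kind of mismatch this PR addresses.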

Why are the changes needed?

pyspark.pandas.tests.computation.test_describe FrameDescribeTests.test_describe_empty fails with pandas 3 because pandas changed how astype(str) handles null values in empty timestamp describe() results.

Without this change, pandas-on-Spark and the pandas-based expectation disagree for empty timestamp-only and mixed timestamp frames.

Does this PR introduce any user-facing change?

Yes.

For pandas-on-Spark DataFrame.describe() on empty timestamp-containing frames, null timestamp stats now follow the pandas 3 string-conversion behavior instead of always being materialized as "None".

How was this patch tested?

Ran the related pyspark.pandas.tests.computation.test_describe tests in both pandas 2 and pandas 3 Python environments.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenAI Codex (GPT-5)

ueshin · 2026-03-18T21:55:31Z

cc @gaogaotiantian @HyukjinKwon @zhengruifeng

HyukjinKwon · 2026-03-19T03:38:59Z

Merged to master.

Closes apache#54893 from ueshin/issues/SPARK-56060/describe.

Authored-by: Takuya Ueshin <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

Handle pandas 3 null string conversion in describe() for empty timestamp frames

afc19f6

HyukjinKwon approved these changes Mar 18, 2026

zhengruifeng approved these changes Mar 19, 2026

HyukjinKwon closed this in d201075 Mar 19, 2026

ueshin wants to merge 1 commit into apache:master from ueshin:issues/SPARK-56060/describe
